Abstract

Many systems, ranging from biological and engineering systems to social systems, can
be modeled as directed networks, with links representing directed interaction between
two nodes. To assess the importance of a node in a directed network, various centrality
measures based on different criteria have been proposed. However, calculating the
centrality of a node is often difficult because of the overwhelming size of the network or
incomplete information about the network. Thus, an approximation method for estimating
centrality measures is needed. In this study, we focus on modular networks;
many real-world networks are composed of modules, where connection is dense within a
module and sparse across different modules. We show that ranking-type centrality
measures, including the PageRank, can be efficiently estimated once the modular structure of a
network is extracted. We develop an analytical method to evaluate the centrality of nodes
by combining the local property (i.e., indegree and outdegree of nodes) and the global
property (i.e., centrality of modules). The proposed method is corroborated with real
data. Our results provide a linkage between the ranking-type centrality values of modules
and those of individual nodes. They also reveal the hierarchical structure of networks
in the sense of subordination (not nestedness) laid out by connectivity among modules
of different relative importance. The present study raises a novel motive for identifying
modules in networks.

1 Introduction



A variety of systems of interacting elements can be represented as networks. A network is a 
collection of nodes and links; a link connects a pair of nodes. Generally speaking, some nodes 
play central functions, such as binding different parts of the network together and controlling 
dynamics in the network. To identify important nodes in a network, various centrality measures 
based on different criteria have been proposed [1-3]. 

Links of many real networks such as the World Wide Web (WWW), food webs, neural
networks, protein interaction networks, and many social networks are directed or asymmetrically
weighted. In contrast to the case of undirected networks, a link in directed networks indicates
an asymmetrical relationship between two nodes, for example, the control of the source node of
a link over the target node. The direction of a link indicates the relative importance of the two
nodes. Central nodes in a network in this sense would be, for example, executive personnel
in an organizational network and top predators in a food web. Generally, more (less)
central nodes are located at an upper level (a lower level) in the hierarchy of the network, where
hierarchy refers to the distinction between upper and lower levels in terms of the centrality
value as relevant in, for example, biological [4, 5] and social [6] systems. This type of centrality
measure is necessarily specialized for directed networks and includes the popularity or prestige
measures for social networks [1], ranking systems for webpages such as the PageRank [7, 8]
and HITS [9, 10], adaptations of the PageRank to citation networks of academic papers [11, 12]
and journals [11, 13-15], and ranking systems of sports teams [16]. We call them ranking-type
centrality measures.

Under practical restrictions such as overwhelming network size or incomplete information 
about the network, it is often difficult to exactly obtain ranking-type centrality values of nodes. 
In such situations, the simplest approximators are perhaps those based on the degree of nodes 
(i.e., the number of links owned by a node). For example, the indegree of a node can be an 
accurate approximator of the PageRank of websites [17] and ranks of academic journals [14,15].
However, such local approximations often fail [18-20], implying a significant effect of the global
structure of networks.

A ubiquitous global structure of networks that adversely affects local approximations is the 
modular structure. Both in undirected [21-24] and directed [1,24-27] networks, nodes are often 
classified into modules (also called communities) such that the nodes are densely connected 
within a module and sparsely connected across different modules. In modular networks, some 
modules may be central in a coarse-grained network, where each module is regarded as a 
supernode [28]. However, relationships between the centrality of individual nodes and that of 
modules are not well understood. Using these relationships, we will be able to assess centralities 
of individual nodes only on the basis of coarse-grained information about the organization of 
modules or under limited computational resources. 

In this study, we analyze the ranking-type centrality measures for directed modular
networks. We are concerned with the modular structure of the network in the meaning of
partitioning of the network into parts, and not the overlapping community structure [22,23,25]. We
determine the centrality of modules, which reflects the hierarchical structure of the networks
in the sense of subordination [4-6], not nestedness [30-33]. Then, we show that module
membership is a chief determinant of the centrality of individual nodes. A node tends to be central
when it belongs to a high-rank module and it is locally central by, for example, having a large
degree. To clarify these points, we analytically evaluate centrality in modular networks. On
the basis of the matrix tree theorem, the centrality value of a node is derived from the number
of spanning trees rooted at the node. We use this relationship to develop an approximation
scheme for the ranking-type centrality values of nodes in modular networks. The
approximated value turns out to be a combination of local and global effects, i.e., the degree of nodes
and the centrality of modules. For analytical tractability, we formulate our theory using the
ranking-type centrality measure called the influence, but the results are also applicable to the
PageRank. We corroborate the effectiveness of the proposed scheme using the Caenorhabditis
elegans neural network, an email social network, and the WWW.

2 Ranking-type Centrality Measures

We consider a directed and weighted network of $N$ nodes denoted by $G = \{V, E\}$. A set of
nodes is denoted by $V = \{1, \ldots, N\}$, and $E$ is a set of directed links, i.e., node $i$ sends a
directed link to node $j$ with weight $w_{ij}$ if and only if $(i,j) \in E$. The weight represents the
amplitude of the direct influence of node $i$ on node $j$. We set $w_{ij} = 0$ when $(i,j) \notin E$.

Depending on applications, different centrality measures can be used to rank the nodes 
in a network. We analyze the effect of the modular structure on ranking of nodes using a 
centrality measure called influence because it facilitates theoretical analysis. The existence of 
a one-to-one mapping from the influence to the PageRank [7,8, 17, 18] and to variations of the 
PageRank used for ranking academic journals and articles [11, 13-15], which we will explain in 
this section, enables us to adapt our results to the case of such ranking-type centrality measures. 
To show that our results are not specific to the proposed measure, we study the influence and 
the PageRank simultaneously. 

We define the influence of node $i$, denoted by $v_i$, by the solution of the following set of $N$
linear equations:

$$ v_i = \frac{\sum_{j=1}^{N} w_{ij} v_j}{k_i^{\rm in}}, \quad (1 \le i \le N), \tag{1} $$

where $k_i^{\rm in} = \sum_{j'=1}^{N} w_{j'i}$ is the indegree of node $i$, and $\sum_{i=1}^{N} v_i = 1$ provides the normalization.
$v_i$ is large if (i) node $i$ directly affects many nodes (i.e., many terms, probably with a large
$w_{ij}$, on the RHS of Eq. (1)), (ii) the nodes that receive directed links from node $i$ are influential
(i.e., large $v_j$ on the RHS), and (iii) node $i$ has a small indegree.
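As an illustration of Eq. (1), the following minimal sketch computes the influence by a power iteration. It assumes a strongly connected network stored as a dense list of lists; the function name `influence`, the lazy (half-step) update, and the three-node example are choices made here for illustration and are not taken from the text.

```python
def influence(w, tol=1e-14, max_iter=100_000):
    """Influence v_i of Eq. (1); w[i][j] is the weight of the link from node i to node j."""
    n = len(w)
    k_in = [sum(w[j][i] for j in range(n)) for i in range(n)]
    # Iterate on u_i = k_in_i * v_i, which obeys u_i = sum_j (w_ij / k_in_j) u_j.
    # The matrix w_ij / k_in_j is column stochastic, so sum(u) is conserved.
    u = [1.0 / n] * n
    for _ in range(max_iter):
        pu = [sum(w[i][j] / k_in[j] * u[j] for j in range(n)) for i in range(n)]
        new_u = [0.5 * (a + b) for a, b in zip(u, pu)]  # lazy step avoids periodic cycling
        converged = max(abs(a - b) for a, b in zip(u, new_u)) < tol
        u = new_u
        if converged:
            break
    v = [u[i] / k_in[i] for i in range(n)]
    total = sum(v)
    return [x / total for x in v]


# Hypothetical example: links 0->1, 0->2, 1->2, 2->0, all of weight 1.
w = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
v = influence(w)
print(v)  # node 0 (two outgoing links, one incoming link) ranks highest
```

In this example node 2 has the largest indegree and node 0 the largest outdegree, which is why node 0 obtains the largest influence, in line with conditions (i) and (iii).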

Equation (1) is the definition for strongly connected networks; $G$ is defined to be strongly
connected if there is a path, i.e., a sequence of directed links, from any node $i$ to any node $j$. If
$G$ is not strongly connected, there is no path from a certain node $i$ to a certain node $j$. Then,
node $i$ cannot influence node $j$ even indirectly, and the problem of determining the influence of
nodes is decomposed into that for each strongly connected component. Therefore, we assume
that $G$ is strongly connected.

The influence $v_i$ represents the importance of nodes in different types of dynamics on
networks (see Appendix A for details). Firstly, $v_i$ is equal to the fixation probability of a new
opinion introduced at node $i$ in a voter-type interacting particle system [20]. Secondly, if all
links are reversed such that a random walker visits influential nodes with high probabilities, $v_i$
is the stationary density of the continuous-time simple random walk. Thirdly, $v_i$ is the so-called
reproductive value used in population ecology [29,34]. Fourthly, $v_i$ is the contribution of an
opinion at node $i$ to the opinion of the entire population in the consensus in the continuous-time
version of the DeGroot model [35-37]. Fifthly, $v_i$ is equal to the amplitude of the collective
response in the synchronized dynamics when an input is given to node $i$ [38].

The influence can be mapped to the PageRank. The PageRank, denoted by $R_i$ for node $i$,
is defined self-consistently by

$$ R_i = \frac{q}{N} + (1-q) \sum_{j=1}^{N} \frac{w_{ji}}{k_j^{\rm out}} R_j + \delta_{k_i^{\rm out},0}\, (1-q) R_i, \quad (1 \le i \le N), \tag{2} $$

where $k_i^{\rm out} = \sum_{j'=1}^{N} w_{ij'}$ is the outdegree of node $i$, $\delta_{ij} = 1$ if $i = j$, and $\delta_{ij} = 0$ if $i \neq j$. The
second term on the RHS of Eq. (2) is present only when $k_j^{\rm out} > 0$. Note that the direction of
the link in the PageRank has the meaning opposite to that in the influence; $R_i$ of a webpage
is incremented by an incoming link (hyperlink), whereas $v_i$ is incremented by an outgoing link.
The introduction of $q > 0$ homogenizes $R_i$ and is necessary for the PageRank to be defined for
directed networks that are not strongly connected, such as real web graphs. The normalization
is given by $\sum_{i=1}^{N} R_i = 1$. $R_i$ is regarded as the stationary density of the discrete-time simple
random walk on the network [7,8, 17], where $q$ is the probability of a jump to a randomly
selected node.

An essential difference between the two measures lies in normalization. In the influence,
the total credit that node $j$ gives its neighbors is equal to $\sum_{i=1}^{N} w_{ij} v_j = k_j^{\rm in} v_j$, while that in
the PageRank is equal to $\sum_{i=1}^{N} (w_{ji}/k_j^{\rm out}) R_j = R_j$. In the PageRank, the multiplicative factor
of the total credit that node $j$ gives other nodes is set to $\sum_{i=1}^{N} w_{ji}/k_j^{\rm out} = 1$ to prevent nodes
with many outgoing links from biasing ranks of nodes. In the ranking of webpages, creation
of a webpage $i$ with many hyperlinks does not indicate that node $i$ gives a large amount of
credit to recipients of a link. Each neighbor of node $i$ receives the credit $R_i/k_i^{\rm out}$ from node
$i$. We should refer to the PageRank when nodes can select the number of recipients of credit
(e.g., the WWW and citation-based ranking of academic papers and journals). We should use
the influence when the importance of all links is proportional to their weights (e.g., opinion
formation and synchronization mentioned above).
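To make the difference in normalization concrete, the sketch below computes both measures on a small hypothetical network and checks numerically that, for $q = 0$, the PageRank of $G$ is proportional to the outdegree in $G$ times the influence in the link-reversed network. The helper names (`stationary`, `influence`, `pagerank_q0`) and the example network are assumptions made for illustration; the network is assumed to be strongly connected and free of dangling nodes.

```python
def stationary(p, tol=1e-14, max_iter=100_000):
    """Fixed point x = p x with sum(x) = 1 for a column-stochastic matrix p (lazy power iteration)."""
    n = len(p)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        px = [sum(p[i][j] * x[j] for j in range(n)) for i in range(n)]
        new_x = [0.5 * (a + b) for a, b in zip(x, px)]
        converged = max(abs(a - b) for a, b in zip(x, new_x)) < tol
        x = new_x
        if converged:
            break
    return x


def normalize(x):
    total = sum(x)
    return [a / total for a in x]


def influence(w):
    """Eq. (1): node j hands the credit w_ij * v_j to each node i that points to it."""
    n = len(w)
    k_in = [sum(w[j][i] for j in range(n)) for i in range(n)]
    u = stationary([[w[i][j] / k_in[j] for j in range(n)] for i in range(n)])
    return normalize([u[i] / k_in[i] for i in range(n)])


def pagerank_q0(w):
    """Eq. (2) with q = 0: node j splits its rank R_j over its outgoing links."""
    n = len(w)
    k_out = [sum(w[i]) for i in range(n)]
    return stationary([[w[j][i] / k_out[j] for j in range(n)] for i in range(n)])


# Hypothetical example: links 0->1, 0->2, 1->2, 2->0, all of weight 1.
w = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
n = len(w)
w_rev = [[w[j][i] for j in range(n)] for i in range(n)]  # reverse every link
k_out = [sum(row) for row in w]

R = pagerank_q0(w)
v_rev = influence(w_rev)
R_from_v = normalize([k_out[i] * v_rev[i] for i in range(n)])
print(R, R_from_v)
```

The two printed vectors agree up to numerical precision, which is the one-to-one mapping used in the rest of this section to carry results for the influence over to the PageRank.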

The PageRank is equal to the influence in a network modified from the original network $G$
(see Appendix B for derivation). In particular, the PageRank in $G$ for $q = 0$ is given by

$$ R_i = \frac{k_i^{\rm out}\, v_i(G^{\rm rev})}{\sum_{j=1}^{N} k_j^{\rm out}\, v_j(G^{\rm rev})}, \tag{3} $$

where $v_i(G^{\rm rev})$ is the influence of node $i$ for the network $G^{\rm rev}$, which is obtained by reversing
all links of $G$, and $k_i^{\rm out}$ is the outdegree of node $i$ in $G$. We use this relation to extend our
results derived for the influence to the case of the PageRank.

The influence has a nontrivial sense only in directed networks because $w_{ij} = w_{ji}$ in Eq. (1)
leads to $v_i = 1/N$ [20,39,40]. Furthermore, any network with $k_i^{\rm in} = k_i^{\rm out}$ ($1 \le i \le N$)
results in $v_i = 1/N$. Therefore, from Eq. (3), $R_i = k_i^{\rm in} / (\langle k \rangle N)$ for such a network, where
$\langle k \rangle \equiv \sum_{i=1}^{N} k_i^{\rm in}/N = \sum_{i=1}^{N} k_i^{\rm out}/N$ is the mean degree. In this case, $v_i$ and $R_i$ are not affected
by the global structure of the network.

In directed or asymmetrically weighted networks, $v_i$ and $R_i$ are heterogeneous in general.
The mean-field approximation (MA) is the simplest ansatz based on the local property of a
node. By using $\sum_{j=1}^{N} w_{ij} v_j \approx \sum_{j=1}^{N} w_{ij} \bar{v} = k_i^{\rm out} \bar{v}$, where $\bar{v} \equiv \sum_{i=1}^{N} v_i/N = 1/N$, we obtain
$v_i \propto k_i^{\rm out}/k_i^{\rm in}$. Combination of this and Eq. (3) yields the MA for the PageRank:
$R_i \approx k_i^{\rm in} / (\langle k \rangle N)$.

We can calculate $v_i$ by enumerating spanning trees. To show this, note that Eq. (1) implies
that $(v_1, \ldots, v_N)$ is the left eigenvector with eigenvalue zero of the Laplacian matrix $L$ defined by
$L_{ii} = \sum_{j'=1}^{N} w_{j'i}$ and $L_{ij} = -w_{ji}$ ($i \neq j$), i.e.,

$$ \sum_{i=1}^{N} v_i L_{ij} = 0, \quad (1 \le j \le N). \tag{4} $$

The cofactor of $L$ is defined by

$$ {\rm Co}(i,j) \equiv (-1)^{i+j} \det L(i,j), \tag{5} $$

where $L(i,j)$ is an $(N-1) \times (N-1)$ matrix obtained by deleting the $i$-th row and the $j$-th
column of $L$. Because $\sum_{j=1}^{N} L_{ij} = 0$ ($1 \le i \le N$), ${\rm Co}(i,j)$ does not depend on $j$. Using Eq. (5)
and the fact that $L$ is degenerate, we obtain

$$ \sum_{i=1}^{N} {\rm Co}(i,i) L_{ij} = \sum_{i=1}^{N} {\rm Co}(i,j) L_{ij} = \det L = 0, \quad (1 \le j \le N). \tag{6} $$

Therefore, $({\rm Co}(1,1), \ldots, {\rm Co}(N,N))$ is the left eigenvector of $L$ with eigenvalue zero, which
yields

$$ v_i \propto {\rm Co}(i,i) = \det L(i,i). \tag{7} $$

From the matrix tree theorem [41,42], $\det L(i,i)$ is equal to the sum of the weight of all possible
directed spanning trees rooted at node $i$. The weight of a spanning tree is equal to the product
of the weight of $N-1$ links forming the spanning tree.
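The relation between the influence and the principal minors of the Laplacian can be checked on a small example. The sketch below is a hedged illustration using exact rational arithmetic; the function names (`det`, `laplacian`, `influence_from_minors`) and the three-node network are hypothetical, and self-loops are assumed to be absent.

```python
from fractions import Fraction


def det(m):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result  # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return result


def laplacian(w):
    """L_ii = indegree of node i, L_ij = -w_ji for i != j."""
    n = len(w)
    k_in = [sum(w[j][i] for j in range(n)) for i in range(n)]
    return [[k_in[i] if i == j else -w[j][i] for j in range(n)] for i in range(n)]


def influence_from_minors(w):
    """Return det L(i, i) for every node i and the normalized influence of Eq. (7)."""
    lap = laplacian(w)
    n = len(w)
    minors = []
    for i in range(n):
        sub = [[lap[r][c] for c in range(n) if c != i] for r in range(n) if r != i]
        minors.append(det(sub))
    total = sum(minors)
    return minors, [m / total for m in minors]


# Hypothetical example: links 0->1, 0->2, 1->2, 2->0, all of weight 1.
w = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
minors, v = influence_from_minors(w)
print(minors, v)
```

For unit weights each minor is an integer count of spanning trees, so the influence of a node is its share of all rooted spanning trees of the network.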

3 Centrality in Modular Directed Networks 

Most directed networks in the real world are more structured than those captured by the MA. 
A ubiquitous global structure of networks is modular structure. Modular networks consist of 
several densely connected subgraphs called modules (also called communities), and modules are 
connected to each other by relatively few links. As an example, a subnetwork of the C. elegans
neural network [43,44] containing 4 modules is shown in Fig. 1(a). Modular structure is common
in both undirected [21-24] and directed [24-27] networks.

Modular structure of directed networks often leads to hierarchical structure. By hierarchy, 
we refer to the situation in which modules are located at different levels in terms of the value 
of the ranking-type centrality. It is relatively easy to traverse from a node in an upper level to
one in a lower level along directed links, but not vice versa. The hierarchical structure leads to
the deviation of $v_i$ from the value obtained from the MA.

As an example, consider the directed $P$-partite network shown in Fig. 2. Layer $P'$ ($1 \le P' \le P$)
contains $N/P$ nodes, where $N$ is divisible by $P$. The nodes in the same layer are connected
bidirectionally with weight $w$. Each node in layer $P'$ ($1 \le P' \le P-1$) sends directed links
to all nodes in layer $P'+1$ with weight unity, and each node in layer $P'$ ($2 \le P' \le P$) sends
directed links to all the $N/P$ nodes in layer $P'-1$ with weight $\epsilon$. The following results do
not change if two adjacent layers are connected via just an asymmetrically weighted bridge, as
shown in Fig. 1(b). Because of the symmetry, all nodes in layer $P'$ have the same influence $v_{P'}$.
From Eq. (1), we obtain

$$ v_{P'} = \frac{\epsilon^{P'-1} (1-\epsilon) P}{(1-\epsilon^{P}) N}. \tag{8} $$

When $\epsilon < 1$, a node in a layer with small $P'$ is more influential than a node in a layer with large
$P'$. Neglecting the intralayer links, the MA yields

$$ \frac{k_i^{\rm out}}{k_i^{\rm in}} = \begin{cases} \epsilon^{-1}, & (\text{node } i \in \text{layer } 1) \\ \epsilon, & (\text{node } i \in \text{layer } P) \\ 1, & (\text{otherwise}). \end{cases} \tag{9} $$

The actual $v_{P'}$ decreases exponentially throughout the hierarchy, whereas $k_i^{\rm out}/k_i^{\rm in}$ does not.
We observe a similar discrepancy in the case of the PageRank.
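The closed form for the layered network can be verified exactly on a small instance. The following sketch builds a network with $P = 3$ layers of two nodes each, intralayer weight $w = 1$ and $\epsilon = 1/2$ (all values chosen here purely for illustration), solves Eq. (1) over the rationals, and compares the result with $\epsilon^{P'-1}(1-\epsilon)P/[(1-\epsilon^{P})N]$. The function names are hypothetical.

```python
from fractions import Fraction


def influence_exact(w):
    """Solve Eq. (1) with sum(v) = 1 by Gauss-Jordan elimination over the rationals."""
    n = len(w)
    k_in = [sum(w[j][i] for j in range(n)) for i in range(n)]
    # Row i encodes k_in_i v_i - sum_j w_ij v_j = 0; the last column is the right-hand side.
    a = [[Fraction(-w[i][j]) for j in range(n)] + [Fraction(0)] for i in range(n)]
    for i in range(n):
        a[i][i] += k_in[i]
    a[n - 1] = [Fraction(1)] * (n + 1)  # replace one redundant equation by sum(v) = 1
    for c in range(n):
        pivot = next(r for r in range(c, n) if a[r][c] != 0)
        a[c], a[pivot] = a[pivot], a[c]
        p = a[c][c]
        a[c] = [x / p for x in a[c]]
        for r in range(n):
            if r != c and a[r][c] != 0:
                f = a[r][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [a[i][n] for i in range(n)]


def layered_network(P, n, w_intra, eps):
    """P layers of n nodes each; node index = layer * n + position within the layer."""
    N = P * n
    w = [[Fraction(0)] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            li, lj = i // n, j // n
            if i != j and li == lj:
                w[i][j] = Fraction(w_intra)  # bidirectional intralayer link
            elif lj == li + 1:
                w[i][j] = Fraction(1)        # link to the next layer, weight unity
            elif lj == li - 1:
                w[i][j] = Fraction(eps)      # link to the previous layer, weight epsilon
    return w


P, n, eps = 3, 2, Fraction(1, 2)
N = P * n
w = layered_network(P, n, w_intra=1, eps=eps)
v = influence_exact(w)
closed_form = [eps ** layer * (1 - eps) * P / ((1 - eps ** P) * N)
               for layer in range(P) for _ in range(n)]
ratio = [Fraction(sum(w[i])) / sum(w[j][i] for j in range(N)) for i in range(N)]
print(v, closed_form, ratio)
```

In this instance the exact influence halves from one layer to the next, while the degree ratio $k_i^{\rm out}/k_i^{\rm in}$ only takes the values 3/2, 1, and 2/3, which illustrates the discrepancy between the exact values and the MA.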

We develop an improved approximation for the influence in modular networks by combining
the MA and the correction factor obtained from the global modular structure of networks.
Consider a network of $m$ modules $M_I$ ($1 \le I \le m$). For mathematical tractability, we assume
that each module communicates with the other modules via a single portal node $I_p \in M_I$, as
illustrated in Fig. 1(b); the network shown in Fig. 1(b) is an approximation of that shown in
Fig. 1(a). We denote the weight of the link $(I_p, J_p)$ by $w_{I \to J}$ ($I \neq J$).

We obtain $v_i$ in this modular network by enumerating spanning trees rooted at node $i \in M_I$.
Denote such a spanning tree by $T$. The intersection of $T$ and $M_I$ is a spanning tree restricted
to $M_I$ and rooted at node $i$. This restricted spanning tree reaches all nodes in $M_I$. $T$ enters
$M_J$ ($J \neq I$) via a directed path from node $I_p$ to node $J_p$. This path is provided by a spanning
tree in the network of $m$ modules, where each module is represented by a single node. The
other nodes in $M_J$ are spanned by the intersection of $T$ and $M_J$, which forms a spanning tree
restricted to $M_J$ and rooted at node $J_p$. Therefore, $T$ is a concatenation of (i) an intramodular
spanning tree in $M_I$ rooted at node $i$, (ii) $m-1$ intramodular spanning trees, one in each $M_J$ and
rooted at node $J_p$ ($J \neq I$), and (iii) a spanning tree in the network of $m$ modules rooted at
node $I_p$. Let $\mathcal{N}_\ell(M_I)$ ($\ell$ for local) denote the number of spanning trees in $M_I$ with an arbitrary
root, and $\mathcal{N}_g$ ($g$ for global) denote the number of spanning trees in a network of $m$ modules
with an arbitrary root. Then, the number of spanning trees in $G$ rooted at node $i$ is equal to

$$ \mathcal{N}_\ell(M_I)\, v_i^{\ell} \times \left[ \prod_{J=1, J \neq I}^{m} \mathcal{N}_\ell(M_J)\, v_{J_p}^{\ell} \right] \times \mathcal{N}_g\, v^{g}_{M_I}, \tag{10} $$

where $v_i^{\ell}$ is the influence of node $i \in M_I$ within $M_I$ and $v^{g}_{M_I}$ is the influence of $M_I$ in the network
of $m$ modules. The first, second, and third factors in Eq. (10) correspond to the numbers of
spanning trees of types (i), (ii), and (iii), respectively. Therefore, we obtain

$$ v_i \propto v_i^{\ell}\, v^{g}_{M_I} \prod_{J=1, J \neq I}^{m} v_{J_p}^{\ell}. \tag{11} $$

For nodes $i, i' \in M_I$, Eq. (11) yields $v_i/v_{i'} = v_i^{\ell}/v_{i'}^{\ell}$; the relative influence of nodes in the
same module is equal to their relative influence within the module. For nodes in different
modules, i.e., node $i$ in module $M_I$ and node $j$ in module $M_J$ ($J \neq I$), Eq. (11) leads to

$$ \frac{v_i}{v_j} = \frac{v_i^{\ell}\, v^{g}_{M_I}\, v_{J_p}^{\ell}}{v_j^{\ell}\, v^{g}_{M_J}\, v_{I_p}^{\ell}}. \tag{12} $$

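If each module is homogeneous, we approximate $v_i^{\ell} \approx v_{I_p}^{\ell}$ and $v_j^{\ell} \approx v_{J_p}^{\ell}$ and obtain
$v_i/v_j \approx v^{g}_{M_I}/v^{g}_{M_J}$; the global structure of the network laid out by links across modules
determines the influence of each node. If each module is heterogeneous in degree, we use the MA, i.e.,
$v_i^{\ell} \approx (k_i^{\rm out}/k_i^{\rm in}) / \sum_{i';\, {\rm node}\ i' \in M_I} (k_{i'}^{\rm out}/k_{i'}^{\rm in})$ and
$v_j^{\ell} \approx (k_j^{\rm out}/k_j^{\rm in}) / \sum_{j';\, {\rm node}\ j' \in M_J} (k_{j'}^{\rm out}/k_{j'}^{\rm in})$. By
assuming that $I_p$ ($J_p$) is a typical node in $M_I$ ($M_J$), we set
$v_{I_p}^{\ell} \approx 1 / \sum_{i';\, {\rm node}\ i' \in M_I} (k_{i'}^{\rm out}/k_{i'}^{\rm in})$ and
$v_{J_p}^{\ell} \approx 1 / \sum_{j';\, {\rm node}\ j' \in M_J} (k_{j'}^{\rm out}/k_{j'}^{\rm in})$. Then, Eq. (12) is transformed into

$$ \frac{v_i}{v_j} \approx \frac{(k_i^{\rm out}/k_i^{\rm in})\, v^{g}_{M_I}}{(k_j^{\rm out}/k_j^{\rm in})\, v^{g}_{M_J}}. \tag{13} $$

Therefore, we define an approximation scheme, called the MA-Mod, for node $i$ in module $M_I$
as

$$ v_i \approx \frac{k_i^{\rm out}}{k_i^{\rm in}}\, v^{g}_{M_I}. \tag{14} $$

Equation (14) can be used for general modular networks in which different modules can be
connected by more than one link.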

Two crucial assumptions underlie Eq. (14). Firstly, a module is assumed to be an uncorrelated
and possibly heterogeneous random network so that the MA is effective within the
module. Note that the degree of nodes can be heterogeneously distributed. Secondly, most
links are assumed to be intramodular so that the local MA is simply given by $v_i^{\ell} \propto k_i^{\rm out}/k_i^{\rm in}$.
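A minimal sketch of the MA-Mod estimator is given below, assuming that the partition into modules is already known and passed as a list `module` of module indices. The coarse-grained weights are obtained by summing link weights between modules, the module-level influence is obtained by solving Eq. (1) on the network of modules, and the estimate is the degree ratio times the module-level influence, normalized to sum to one. The function names and the six-node example are hypothetical and chosen for illustration only.

```python
from fractions import Fraction


def influence_exact(w):
    """Solve Eq. (1) with sum(v) = 1 by Gauss-Jordan elimination over the rationals."""
    n = len(w)
    k_in = [sum(w[j][i] for j in range(n)) for i in range(n)]
    a = [[Fraction(-w[i][j]) for j in range(n)] + [Fraction(0)] for i in range(n)]
    for i in range(n):
        a[i][i] += k_in[i]  # self-weights cancel on the diagonal
    a[n - 1] = [Fraction(1)] * (n + 1)  # normalization replaces one redundant equation
    for c in range(n):
        pivot = next(r for r in range(c, n) if a[r][c] != 0)
        a[c], a[pivot] = a[pivot], a[c]
        p = a[c][c]
        a[c] = [x / p for x in a[c]]
        for r in range(n):
            if r != c and a[r][c] != 0:
                f = a[r][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [a[i][n] for i in range(n)]


def ma_mod(w, module):
    """MA-Mod estimate: (k_out / k_in) times the influence of the node's module."""
    n, m = len(w), max(module) + 1
    w_mod = [[Fraction(0)] * m for _ in range(m)]
    for i in range(n):
        for j in range(n):
            w_mod[module[i]][module[j]] += w[i][j]  # total weight from module I to module J
    v_g = influence_exact(w_mod)
    k_in = [sum(w[j][i] for j in range(n)) for i in range(n)]
    k_out = [sum(w[i]) for i in range(n)]
    raw = [Fraction(k_out[i]) / k_in[i] * v_g[module[i]] for i in range(n)]
    total = sum(raw)
    return [x / total for x in raw]


# Hypothetical example: three modules of two nodes; forward weight 1, backward weight 1/2.
half = Fraction(1, 2)
w = [[0, 1, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 0],
     [half, half, 0, 1, 1, 1],
     [half, half, 1, 0, 1, 1],
     [0, 0, half, half, 0, 1],
     [0, 0, half, half, 1, 0]]
module = [0, 0, 1, 1, 2, 2]
estimate = ma_mod(w, module)
print(estimate)
```

The modules in this example are internally homogeneous, so the module-level term alone already orders the nodes correctly; the degree ratio is expected to matter more when degrees vary within a module.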

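To obtain $v^{g}_{M_I}$ for general networks, we define $w_{I \to J} \equiv \sum_{i \in M_I,\, j \in M_J} w_{ij}$ and approximate
$v_i \approx v^{g}_{M_I} / \sum_{I'=1}^{m} N_{I'} v^{g}_{M_{I'}}$ (node $i \in M_I$), where $N_{I'}$ is the number of nodes in $M_{I'}$. Then,
$\sum_{i=1}^{N} v_i = 1$ is satisfied. Equation (1) is transformed into

$$ v^{g}_{M_I} = \frac{\sum_{J=1}^{m} w_{I \to J}\, v^{g}_{M_J}}{k_I^{\rm in}}, \quad (1 \le I \le m), \tag{15} $$

where

$$ k_I^{\rm in} \equiv \sum_{J=1}^{m} w_{J \to I}. \tag{16} $$

Equation (15) has the same form as Eq. (1). By solving the set of $m$ linear equations, we obtain
$v^{g}_{M_I}$.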
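The coarse-graining step can be spelled out as follows. This is a sketch under the stated approximation that $v_i$ is the same for all nodes of a module; the symbol $c$, introduced here only for the derivation, denotes the common normalization constant $1/\sum_{I'=1}^{m} N_{I'} v^{g}_{M_{I'}}$.

```latex
\begin{align*}
\sum_{i \in M_I} k_i^{\rm in} v_i
  &= \sum_{i \in M_I} \sum_{j=1}^{N} w_{ij} v_j
  && \text{(Eq.~(1) multiplied by $k_i^{\rm in}$ and summed over $i \in M_I$)} \\
c\, v^{g}_{M_I} \sum_{J=1}^{m} \sum_{j' \in M_J} \sum_{i \in M_I} w_{j'i}
  &= c \sum_{J=1}^{m} \Bigl( \sum_{i \in M_I} \sum_{j \in M_J} w_{ij} \Bigr) v^{g}_{M_J}
  && \text{(substituting $v_i \approx c\, v^{g}_{M_I}$)} \\
v^{g}_{M_I} \sum_{J=1}^{m} w_{J \to I}
  &= \sum_{J=1}^{m} w_{I \to J}\, v^{g}_{M_J}
  && \text{(definition of $w_{I \to J}$)}
\end{align*}
```

The intramodular term $w_{I \to I} v^{g}_{M_I}$ appears on both sides and cancels, so only the weights between different modules affect the module-level influence.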

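Equation (14) adapted for $v_i(G^{\rm rev})$ leads to $v_i(G^{\rm rev}) \approx (k_i^{\rm in}/k_i^{\rm out})\, v^{g}_{M_I}(G^{\rm rev})$. By combining
this with Eq. (3), we obtain the MA-Mod scheme for the PageRank with $q = 0$:

$$ R_i \approx \frac{k_i^{\rm in}}{k_I^{\rm out}}\, R^{g}_{M_I}, \tag{17} $$

where $R^{g}_{M_I}$ is the PageRank of module $M_I$ in the network of modules and
$k_I^{\rm out} \equiv \sum_{J=1}^{m} w_{I \to J}$ is the outdegree of $M_I$ in that network.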

4 Application to Real Data 

We examine the effectiveness of the MA-Mod scheme using three datasets from different fields.

4.1 Neural network

In the network of nematode C. elegans, a pair of neurons may be connected by chemical 
synapses, which are directed links, or gap junctions, which are undirected links. We calculate 
the influence of neurons on the basis of a connectivity dataset [43,44]. The link weight $w_{ij}$ is
assumed to be the sum of the number of chemical synapses from neuron $i$ to neuron $j$ and that
of the gap junctions between neuron $i$ and neuron $j$. The following results are qualitatively the
same if we ignore the gap junction or the link weight (see Appendix C for the results). The 
largest strongly connected component, which we simply call the neural network, contains 274 
nodes and 2959 links. 

It is difficult to determine whether the influence or the PageRank is more appropriate 
from current biological evidence. If postsynaptic neurons linearly integrate different synaptic 
inputs, the influence may be an appropriate measure. In contrast, postsynaptic neurons may 
effectively select one synaptic input by a nonlinear mechanism. If each input is selected with 
the same probability in a long run and the activity level does not differ much across neurons, 
the PageRank may be appropriate. We examine both scenarios using power iteration (see 
Appendix D for the methodology). 

Among 274 neurons, 54, 79, and 87 neurons are classified as sensory neurons, interneurons, 
and motor neurons, respectively [44]. By definition, sensory neurons directly receive external 
input such as touch and chemical substances, motor neurons send direct commands to move the 
body, and interneurons mediate information processing in various ways. The other neurons are 
polymodal neurons or neurons whose functions are unknown. Neurons with a large $v_i$ are mostly
sensory neurons. For example, among the 10 neurons with the largest $v_i$, 8 are sensory neurons
(ALMR, ASJL, ASJR, AVM, IL2VL, PHAL, PHAR, PVM) and 2 are interneurons (AIML,
AIMR). Generally speaking, these neurons have a large $v_i$ not simply because their $k_i^{\rm out}/k_i^{\rm in}$ is
large. The average of $v_i / [(k_i^{\rm out}/k_i^{\rm in}) / \sum_{j=1}^{N} (k_j^{\rm out}/k_j^{\rm in})]$ over the 10 neurons is equal to 3.456 (see
Tab. A2 in Appendix C for the values for individual neurons). These neurons are located at
upper levels of the neural network in the global sense. The conclusion remains qualitatively the
same if we use $R_i(G^{\rm rev})$. Recall that the PageRank is calculated for $G^{\rm rev}$ because the meaning
of the direction of the link in the influence is opposite to that in the PageRank.

The average values of $v_i$ ($R_i(G^{\rm rev})$) for sensory neurons, interneurons, and motor neurons
are equal to 0.009235 (0.006621), 0.003614 (0.005415), and 0.001032 (0.001323), respectively.
The cumulative distributions of $v_i$ for different classes of neurons are shown in Fig. 3. Even
though many synapses from motor neurons to interneurons and sensory neurons, and
from interneurons to sensory neurons, exist, these numerical results indicate that the neural
network is principally hierarchical. Generally speaking, sensory neurons, which directly receive 
external stimuli, are located at upper levels of the hierarchy, motor neurons are located at lower 
levels, and interneurons are located in between. Sensory neurons serve as a source of signals 
flowing to interneurons and motor neurons down the hierarchy. 

The relation between $v_i$ and the MA is shown in Fig. 4(a) by the squares. They appear
strongly correlated. However, the Pearson correlation coefficient (PCC; see Appendix E for
definition) between $v_i$ and the MA is not large (= 0.5389), as shown in Tab. 1, because $v_i$ tends
to be larger than the MA for nodes with a large $v_i$. Note that the data are plotted in the log-log
scale in Fig. 4.
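The difference between the PCC of the raw values and that of their logarithms can be illustrated with a short sketch. It assumes the standard Pearson formula (the definition actually used is given in Appendix E) and uses made-up numbers, not data from this study; the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Standard Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pearson_log(x, y):
    """PCC between the logarithms of the values (all values must be positive)."""
    return pearson([math.log(a) for a in x], [math.log(b) for b in y])


# Made-up illustration: the actual values grow as the square of the estimate.
estimate = [0.01, 0.1, 1.0, 10.0]
actual = [e ** 2 for e in estimate]
pcc_raw = pearson(estimate, actual)
pcc_log = pearson_log(estimate, actual)
print(pcc_raw, pcc_log)
```

The raw PCC is dominated by the few largest values, whereas the logarithmic PCC weights all orders of magnitude equally, which is why both are useful when the centrality values are broadly distributed.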

The neural network has modular structure [45]. To use the MA-Mod scheme (Eq. (14)),
we apply a community detection algorithm [27] to the neural network. We have selected this
algorithm [27] because a directed link in the present context indicates the flow rather than
the connectedness on which a recent algorithm [26] is based. As a result, we obtain $m = 13$
modules, calculate $v^{g}_{M_1}, \ldots, v^{g}_{M_m}$ from the network of the $m$ modules, and use Eq. (14). $v_i$ is
plotted against the MA-Mod in Fig. 4(a), indicated by circles. The data fitting has improved
compared to the case of the MA, in particular for small values of $v_i$. The PCC between $v_i$ and
the MA-Mod is larger than that between $v_i$ and the MA (Tab. 1). In this example, this holds
true for the raw data and the logarithmic values of the raw data. As a benchmark, we assess
the performance of the global estimator $v_i \approx v^{g}_{M_I} / \sum_{I'=1}^{m} N_{I'} v^{g}_{M_{I'}}$ (node $i \in M_I$), which we
call the Mod. The Mod ignores the variability of $v_i$ within the module and is exact for networks
with completely homogeneous modules, such as the network shown in Fig. 2. The performance
of the Mod is poor in the neural network, as indicated by the triangles in Fig. 4(a) and the
PCC listed in Tab. 1.

Table 1: The Pearson correlation coefficient (PCC) between the centrality measures and
different estimators.

network        C. elegans                Email                     WWW
N              274                       9079                      53968
m              13                        637                       2977
centrality     v_i        R_i(G^rev)     v_i        R_i(G^rev)     v_i(G^rev)   R_i        R_i
q              N/A        0              N/A        0              N/A          0          0.15
MA             0.5389     0.3593         0.5066     0.3997         0.0073       0.0007     0.2162
Mod            0.2927     0.4346         0.5010     0.2452         0.0003       -0.0003    0.4104
MA-Mod         0.7295     0.5005         0.5066     0.2671         0.0000       0.0000     0.3166
MA (log)       0.8024     0.7073         0.3636     0.5353         0.3109       0.1289     0.4627
Mod (log)      0.5195     0.5503         0.8075     0.7147         0.7800       0.7147     0.4098
MA-Mod (log)   0.8736     0.8252         0.8798     0.9022         0.7964       0.7812     0.6256

The values of the PCC between the actual and approximated $R_i(G^{\rm rev})$ are also listed in
Tab. 1. The results for the PageRank are qualitatively the same as those for the influence. With
both measures, the module membership is a crucial determinant of centralities of individual
nodes. Note that, on the basis of the Mod for the influence given by

$$ v_i \approx \frac{v^{g}_{M_I}}{\sum_{I'=1}^{m} N_{I'} v^{g}_{M_{I'}}}, \tag{19} $$

the Mod for the PageRank is given by

$$ \frac{R_i(G^{\rm rev})}{k_i^{\rm in}} \propto \frac{R^{g}_{M_I}(G^{\rm rev})}{k_I^{\rm in} \sum_{I'=1}^{m} N_{I'} v^{g}_{M_{I'}}}, \tag{20} $$

i.e.,

$$ R_i(G^{\rm rev}) \propto \frac{k_i^{\rm in}\, R^{g}_{M_I}(G^{\rm rev})}{k_I^{\rm in}} \approx \frac{R^{g}_{M_I}(G^{\rm rev})}{N_I}. \tag{21} $$

We approximate $k_i^{\rm in}$ by $k_I^{\rm in}/N_I$ because the information about local degree is unavailable for
the Mod.

4.2 Email social network

Our second example is the largest strongly connected component of an email social network [46] . 
A directed link exists between a sender and a recipient of an email. The network has modular 
structure [25]. In the weighted network that we consider here, the link weight is defined by the 
number of emails. The following results do not qualitatively change even if we neglect the link 
weight (see Appendix F). The largest strongly connected component has 9079 nodes and 23808 
links and is partitioned into 637 modules. 

Whether the influence or the PageRank is appropriate for ranking nodes depends on the 
assumption about human behavior. If recipients spend the same amount of time on each 
incoming email (i.e., the link of weight unity), $v_i$ is relevant. In contrast, recipients may have
a fixed amount of time for dealing with all incoming emails. Then, a recipient may equally
distribute the total time available to each email depending on the number of incoming emails.
Under this assumption, the PageRank is relevant. We analyze both $v_i$ and $R_i(G^{\rm rev})$.

In Fig. 4(b), the values of $v_i$ are plotted against those obtained by different estimators. On
the log-log scale, the MA-Mod performs considerably better than the MA. Remarkably, even the
Mod, in which nodes in the same module share an estimated centrality value, performs better
than the MA. This is a strong indication that the structure of the coarse-grained network of
modules is a more important determinant of $v_i$ than the local structure (i.e., degree) in this
example. The values of the PCC summarized in Tab. 1 support our claim. The PCC for the
MA-Mod and the Mod is considerably larger than that for the MA on the logarithmic scale,
which implies that the MA-Mod is especially effective for nodes with small $v_i$. The values of
the PCC between $R_i(G^{\rm rev})$ and the different estimators are listed in Tab. 1. These results are
qualitatively the same as those for $v_i$.

4.3 WWW 

Our last example is the largest strongly connected component of a WWW dataset [47]. The
original network contains 325729 nodes and 1469680 links, and the largest strongly connected
component contains 53968 nodes and 296229 links. The MA fits the PageRank (with $q > 0$)
of some WWW data when nodes of the same degree are grouped together [17] but not other
data [18]. Because of the modular structure of the WWW [25], the MA-Mod is expected to
perform better than the MA.

In Fig. 4(c), $R_i$ for $q = 0$ is plotted against the MA, Mod, and MA-Mod. For nodes with
small PageRanks, the MA-Mod, and even the Mod, are considerably better correlated with $R_i$
than the MA is (note the use of the log-log scale in Fig. 4(c); also see Tab. 1). These nodes
are located at lower levels of hierarchy. The results are qualitatively the same if we use the
influence (Tab. 1). Note that we reverse the links and calculate $v_i(G^{\rm rev})$ because a directed link
in the WWW indicates an impact of the target node on the source node.

The MA-Mod for the PageRank can be extended to the case $q > 0$. From Eq. (2), the MA
for the PageRank is given by

$$ R_i \approx \frac{q}{N} + (1-q) \frac{k_i^{\rm in}}{\langle k \rangle N} = \frac{q \langle k \rangle + (1-q) k_i^{\rm in}}{\langle k \rangle N}, \tag{22} $$

which implies that $k_i^{\rm in}$ in the MA for $q = 0$ is replaced by $q \langle k \rangle + (1-q) k_i^{\rm in}$ for general $q$. We
define the MA-Mod for $q > 0$ by

$$ R_i \approx \frac{\left[ q \langle k \rangle + (1-q) k_i^{\rm in} \right] R^{g}_{M_I}}{q \sum_{J=1}^{m} k_J^{\rm out}/m + (1-q) k_I^{\rm out}}. $$

Note that this ansatz is heuristic, whereas Eq. (17) used for $q = 0$ has an analytical basis. The
PCCs between the PageRank with $q = 0.15$ and the three estimates are listed in Tab. 1. The
MA-Mod performs better than the MA. The advantage of the MA-Mod over the MA is smaller
for $q = 0.15$ than for $q = 0$ because a larger $q$ implies a heavier neglect of the network structure.
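The PageRank of Eq. (2) and its mean-field estimate for general $q$ can be compared with a short sketch. The code below iterates Eq. (2) directly, including the term by which a node without outgoing links retains a fraction $1-q$ of its own rank, and evaluates the MA as $[q\langle k \rangle + (1-q) k_i^{\rm in}]/(\langle k \rangle N)$. The function names and the three-node example are hypothetical, and the simple fixed-point iteration is an assumption made here for illustration, not necessarily the procedure of Appendix D.

```python
def pagerank(w, q, tol=1e-14, max_iter=100_000):
    """Fixed-point iteration of Eq. (2); w[j][i] is the weight of the link from j to i."""
    n = len(w)
    k_out = [sum(row) for row in w]
    r = [1.0 / n] * n
    for _ in range(max_iter):
        new_r = []
        for i in range(n):
            # Rank received from nodes that have at least one outgoing link.
            inflow = sum(w[j][i] / k_out[j] * r[j] for j in range(n) if k_out[j] > 0)
            # A node without outgoing links keeps its own rank (Kronecker-delta term).
            kept = r[i] if k_out[i] == 0 else 0.0
            new_r.append(q / n + (1 - q) * (inflow + kept))
        converged = max(abs(a - b) for a, b in zip(r, new_r)) < tol
        r = new_r
        if converged:
            break
    return r


def ma_pagerank(w, q):
    """Mean-field estimate of the PageRank for general q."""
    n = len(w)
    k_in = [sum(w[j][i] for j in range(n)) for i in range(n)]
    mean_k = sum(k_in) / n
    return [(q * mean_k + (1 - q) * k_in[i]) / (mean_k * n) for i in range(n)]


# Hypothetical example: links 0->1, 0->2, 1->2; node 2 has no outgoing links.
w = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
q = 0.15
R = pagerank(w, q)
R_ma = ma_pagerank(w, q)
print(R, R_ma)
```

In this example the node without outgoing links collects most of the PageRank, far more than the MA suggests from its indegree alone.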

The definition of the PageRank given by Eq. (2) is not continuous with respect to the
outdegree; the term $\delta_{k_i^{\rm out},0}\,(1-q) R_i$ is present for $k_i^{\rm out} = 0$ (i.e., dangling node) and absent for
$k_i^{\rm out} > 0$. Therefore, dangling nodes can have large PageRanks. To improve the MA-Mod for
$q > 0$, we should separately treat dangling nodes and other nodes in the same module. We do
not explore this point because this situation seems to be specific to the working definition of
the PageRank.

In practice, nodes with a small $R_i$ could be irrelevant to the performance of a search engine,
which outputs a list of websites with the largest PageRanks. However, nodes with small PageRanks
constitute the majority of a network when the PageRank follows a power-law distribution.
This is the case for the real WWW data, which are scale-free networks [17,18]. Our method is 
considerably better than the MA especially for nodes with small PageRanks. 

In general, the WWW is nested, with each level defined by webpages, directories, hosts, and 
domains. At the host level, for example, most links are directed toward nodes within the same 
host [48]. Therefore, a host can be regarded as a module in the network. By calculating the 
importance of the host, called the BlockRank, the PageRank can be efficiently computed [48]. 
In spirit, our $R^{\rm g}_M$ is similar to the BlockRank, although our $R^{\rm g}_M$ is used for identifying the
hierarchical levels of networks and systematically approximating $R_i$.

It should be noted that, in general, our approximation scheme runs much faster than the
direct calculation of $v_i$ or $R_i$ for large networks. This is because the community detection
algorithm [27] is fast and the power iteration used for calculating $v_i$ and $R_i$ converges faster
for a smaller network in most (but not all) cases. In the WWW, which is a large network, our
approximation scheme for the PageRank with q = 0 ran more than 100 times faster than the
direct calculation on our computer.

5 Discussion and Conclusions 

We have shown that the hierarchical structure of directed modular networks considerably affects 
ranking-type centrality measures of individual nodes. Using the information about connectivity
among modules, we have significantly improved the estimation of centrality values. Our
theoretical development is based on the measure that we have proposed (i.e., influence), but
the conclusions hold true for both the influence and the PageRank. Our method can be implemented
for variants of the PageRank including the eigenfactor [14, 15] and the so-called
invariant method [11,13] used for ranking academic journals.

The hierarchy discussed in this study is different from the nestedness of networks. Many
networks are hierarchical in the sense that they are nested and have multiple scales [30-33].
A modular network is hierarchical in this sense, at least to a limited extent; two hierarchical
levels are defined by the scale of the entire network and that of a single module. In contrast, we
are concerned with hierarchical relationships among modules defined by the directionality of
networks. This concept of hierarchy has been studied in, for example, food webs [4], transcription
networks [5], and social dynamics [6], but its understanding based on networks is relatively
poor in spite of its intuitive appeal. The influence and the PageRank quantify the hierarchical
position of individual nodes and of modules.

In real networks, nodes and links are subject to change. Such changes affect nodes
near the perturbed nodes, but may not significantly affect modules. In social networks, large
groups change slowly over time as compared to small groups [23]. In addition, in the absence of
complete knowledge of networks, modest understanding of networks at the level of the modular
structure may be adequate. Nodes in a module may also have a common function. These are
the main reasons behind investigating the modular structure of networks. We have shown that the
modular structure is also important in the context of directed networks, hierarchy, and ranking.
The definition of a module is more complex for directed networks than for undirected
networks, and module detection in directed networks is currently under investigation (see [24]
for a review). We hope that our results aid the development of the concept of modules and
related algorithms in directed networks. 

Appendix A: Influence is obtained from various dynamical
models on networks

Fixation probability of evolutionary dynamics 

$v_i$ represents the probability that an 'opinion' introduced at node $i$ spreads to the entire network.
We consider stochastic competitive dynamics between two equally strong types of opinions
A and B; each node takes either A or B at a given time. In the so-called link dynamics
(LD) [39,40], which is a network version of the standard voter model, one link $(i, j) \in E$ is
selected for reproduction with an equal probability in each time step. Then, the type at node
$i$ replaces that at node $j$. This process is repeated until A or B takes over the entire network.

$v_i$ coincides with the fixation probability denoted by $F_i^{\rm LD}$, which is the probability that a
new type A introduced at node $i$ in the network of the resident type B nodes takes over the
entire network [20]. To calculate $F_i^{\rm LD}$, fix a network and consider the initial configuration in
which A is located at node $i$ and B is located at the other $N - 1$ nodes. In the first time step,
one of the following events occurs. With the probability $w_{ij}/\sum_{i',j'} w_{i'j'}$, the link $(i, j) \in E$
is selected for reproduction. Then, type A is located at nodes $i$ and $j$. Let $F_{\{i,j\}}^{\rm LD}$ denote the
fixation probability of type A for this new configuration. With the probability $w_{ji}/\sum_{i',j'} w_{i'j'}$,
the link $(j, i) \in E$ is selected, type A becomes extinct, and the dynamics terminates. With
the remaining probability $\sum_{i'\neq i,\, j'\neq i} w_{i'j'}/\sum_{i',j'} w_{i'j'}$, the configuration of types A and B on the
network does not change. Therefore, we obtain

$$F_i^{\rm LD} = \sum_{j} \frac{w_{ij}}{\sum_{i',j'} w_{i'j'}}\, F_{\{i,j\}}^{\rm LD} + \sum_{j} \frac{w_{ji}}{\sum_{i',j'} w_{i'j'}} \times 0 + \frac{\sum_{i'\neq i,\, j'\neq i} w_{i'j'}}{\sum_{i',j'} w_{i'j'}}\, F_i^{\rm LD}. \tag{24}$$

Because $F_{\{i,j\}}^{\rm LD} = F_i^{\rm LD} + F_j^{\rm LD}$ [20], Eq. (24) leads to Eq. (1) with $v_i$ replaced by $F_i^{\rm LD}$.

Continuous-time simple random walk

Consider a simple random walk on the network in continuous time. In a small time interval
$\Delta t$, a walker at node $i$ is attracted to its neighbor $j$, where $(j, i) \in E$, with the probability $w_{ji}\Delta t$.
Note that the direction of the link is opposite to the convention because the directed link in
the present study indicates the influence of the source node of the link on the target node of
the link. The master equation for the density of the random walker at node $i$, denoted by $F_i^{\rm RW}$
($1 \le i \le N$), is represented by

$$\frac{dF_i^{\rm RW}}{dt} = \sum_{j=1}^N w_{ij} F_j^{\rm RW} - \sum_{j=1}^N w_{ji} F_i^{\rm RW}. \tag{25}$$

Because the network G is strongly connected, $F_i^{\rm RW}$ converges to the unique stationary density.
By setting the LHS of Eq. (25) to 0, we obtain $F_i^{\rm RW} = v_i$.

The simple random walk is closely associated with the fixation problem. The so-called dual
process of the LD is the coalescing random walk. In the coalescing random walk, each of the
$N$ walkers basically performs the continuous-time simple random walk on the network with
the direction of all links reversed. Therefore, the random walker can traverse from node $i$ to
node $j$ when $(j, i) \in E$. If two random walkers meet on a node, they coalesce into one walker.
There is only one walker after a sufficiently long time, and the duality between the two stochastic
processes guarantees $F_i^{\rm RW} = F_i^{\rm LD}$ [20].
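
The claim that the stationary density of the walker satisfies the influence equation can be checked numerically on a small example. The sketch below is only illustrative: it assumes that the density at node i gains at rate w_ij F_j from each node j and is lost at the total rate sum_j w_ji, integrates these dynamics with the Euler method, and then evaluates the residual of k_i^in v_i = sum_j w_ij v_j. The function name `stationary_density`, the step size, and the three-node network are hypothetical.

```python
def stationary_density(w, dt=0.01, steps=5000):
    """Integrate dF_i/dt = sum_j w[i][j] F_j - k_i^in F_i by the Euler method."""
    n = len(w)
    k_in = [sum(w[j][i] for j in range(n)) for i in range(n)]
    F = [1.0 / n] * n  # uniform initial density
    for _ in range(steps):
        dF = [sum(w[i][j] * F[j] for j in range(n)) - k_in[i] * F[i]
              for i in range(n)]
        F = [F[i] + dt * dF[i] for i in range(n)]
    return F


# Hypothetical strongly connected network; w[i][j] is the weight of link i -> j.
w = [[0, 1, 0],
     [0, 0, 1],
     [1, 1, 0]]
F = stationary_density(w)
k_in = [sum(w[j][i] for j in range(3)) for i in range(3)]
# Residual of the influence equation k_i^in v_i = sum_j w_ij v_j at F.
residual = [sum(w[i][j] * F[j] for j in range(3)) - k_in[i] * F[i]
            for i in range(3)]
```

For this toy network the density settles at (1/4, 1/4, 1/2), which is also the solution of the influence equation normalized to sum to unity.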

Reproductive value 

In population ecology, the number of offspring an individual contributes to is quantified as the
reproductive value of the individual. The reproductive value of node $i$ is defined by Eq. (1)
[29,34]. In practice, a node represents a class of individuals defined by, for example, sex, age, 
or habitat. 

DeGroot model in social dynamics 

The DeGroot model [35-37] is a discrete-time model that represents the propagation of information
or opinions in social systems. The state of the individual at node $i$ is represented by a
real value $p_i(t)$; $p_i(t)$ parameterizes the information that the individual at node $i$ has at time
$t$. The weight $w_{ij}$ is the probability that the individual at node $j$ copies the opinion at node $i$
in the next time step. The normalization is given by $\sum_{i=1}^N w_{ij} = 1$. The states of the $N$ nodes
evolve according to

$$p_i(t) = \sum_{j=1}^N w_{ji}\, p_j(t-1). \tag{26}$$

If the network is strongly connected and aperiodic, a consensus is reached asymptotically, i.e.,
$p_1(\infty) = \ldots = p_N(\infty)$ [36,37].

The extent to which the initial information at node $i$ influences the limiting common information
in the continuous-time version of the DeGroot model is equal to $v_i$. To show this, we
start with the discrete-time dynamics given by Eq. (26). Suppose that $F_i^{\rm DG}$ ($1 \le i \le N$)
satisfies $p_1(\infty) = \ldots = p_N(\infty) = \sum_{i=1}^N F_i^{\rm DG} p_i(0)$ for arbitrary $p_1(0), \ldots, p_N(0)$. Because
the configuration $\{p_1(0), \ldots, p_N(0)\}$ and the configuration $\{p_1(1), \ldots, p_N(1)\}$ starting with
$\{p_1(0), \ldots, p_N(0)\}$ end up with the identical $p_1(\infty) = \ldots = p_N(\infty)$, we obtain

$$\sum_{i=1}^N F_i^{\rm DG} p_i(0) = \sum_{i=1}^N F_i^{\rm DG}\left(\sum_{j=1}^N w_{ji}\, p_j(0)\right). \tag{27}$$

Since $p_1(0), \ldots, p_N(0)$ are arbitrary, we obtain

$$F_i^{\rm DG} = \sum_{j=1}^N w_{ij} F_j^{\rm DG}. \tag{28}$$

Equation (28) is of the same form as Eq. (1). However, the condition $\sum_{i=1}^N w_{ij} = 1$ is
imposed in Eq. (28) because the dynamics are defined in the discrete time. The continuous-time
counterpart of the DeGroot model is defined in [37] as follows:

$$\frac{dp_i(t)}{dt} = \sum_{j=1}^N w_{ji}\left(p_j(t) - p_i(t)\right). \tag{29}$$

If $p_1(\infty) = \ldots = p_N(\infty) = \sum_{i=1}^N F_i^{\rm DG} p_i(0)$, we obtain $\sum_{i=1}^N F_i^{\rm DG}\, dp_i(t)/dt = 0$, which leads to

$$\sum_{i=1}^N\left(\sum_{j=1}^N w_{ij} F_j^{\rm DG} - F_i^{\rm DG}\sum_{j=1}^N w_{ji}\right) p_i(0) = 0 \tag{30}$$

for arbitrary $p_1(0), \ldots, p_N(0)$. Therefore, $F_i^{\rm DG} = v_i$.
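
The discrete-time part of this argument can be illustrated with a short simulation. The sketch below is a minimal example under stated assumptions: it iterates the update p_i(t) = sum_j w_ji p_j(t-1) on a hypothetical three-node network whose weights are normalized so that sum_i w_ij = 1, reads off F_i as the consensus reached from a unit input at node i, and checks both F_i = sum_j w_ij F_j and the prediction sum_i F_i p_i(0) for an arbitrary initial condition. The function name `degroot_consensus` and all numerical values are illustrative.

```python
def degroot_consensus(w, p0, steps=200):
    """Iterate p_i(t) = sum_j w[j][i] p_j(t-1) and return the final states."""
    n = len(w)
    p = list(p0)
    for _ in range(steps):
        p = [sum(w[j][i] * p[j] for j in range(n)) for i in range(n)]
    return p


# Hypothetical weights, normalized so that sum_i w[i][j] = 1 for every j.
w = [[0.0, 0.5, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.5, 0.0]]
n = len(w)
# F_i is read off as the consensus value reached from a unit input at node i.
F = []
for i in range(n):
    unit = [1.0 if j == i else 0.0 for j in range(n)]
    F.append(degroot_consensus(w, unit)[0])
# F should satisfy F_i = sum_j w_ij F_j.
residual = [sum(w[i][j] * F[j] for j in range(n)) - F[i] for i in range(n)]
# Consensus from an arbitrary initial condition versus sum_i F_i p_i(0).
p0 = [3.0, -1.0, 2.0]
consensus = degroot_consensus(w, p0)[0]
predicted = sum(F[i] * p0[i] for i in range(n))
```

In this example F comes out as (0.2, 0.4, 0.4), so the initial opinion at node 0 carries half the weight of the opinions at the other two nodes in the final consensus.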



Collective responses in coupled oscillator dynamics 

Following [38], consider $N$ coupled phase oscillators obeying

$$\dot{\phi}_i = \omega_i + \sum_{j=1}^N \Gamma_{ij}(\phi_i - \phi_j) + \sigma p_i(t), \quad i = 1, \ldots, N, \tag{31}$$

where $\phi_i \in [0, 2\pi)$ is the phase of the oscillator $i$, $\omega_i$ is the intrinsic frequency of the oscillator
$i$, $\Gamma_{ij}$ is the effect of node $j$ on node $i$, and $p_i(t)$ is the input at time $t$ applied to node $i$. We
assume that (i) in the absence of the input (i.e., $\sigma = 0$), the system is fully phase-locked, i.e.,
$\phi_i = \phi_i^0 + \Omega t$ for all $i$ with some constants $\phi_i^0$ and $\Omega$ and that (ii) the input is small, i.e.,
$\sigma \ll 1$, so that the system is always close to the phase-locked state. Using the synchronization
condition, i.e., $\omega_i + \sum_{j=1}^N \Gamma_{ij}(\phi_i^0 - \phi_j^0) = \Omega$ ($1 \le i \le N$), which is implied by assumption (i),
we linearize Eq. (31) as

$$\dot{\psi}_i = \sum_{j=1}^N L_{ij}\psi_j + \sigma p_i(t), \tag{32}$$

where $\psi_i = \phi_i - \phi_i^0 - \Omega t$ is a small perturbation in the phase, and $L$ is the Jacobian matrix given
by $L_{ij} = \delta_{ij}\sum_{j'\neq i}\Gamma'_{ij'}(\phi_i^0 - \phi_{j'}^0) - \Gamma'_{ij}(\phi_i^0 - \phi_j^0)(1 - \delta_{ij})$. Note that the effective weight of
the link from node $j$ to node $i$ is given by $w_{ji} = -\Gamma'_{ij}(\phi_i^0 - \phi_j^0)$. Because assumption (i) implies
the stability of the phase-locked state, the real parts of all the eigenvalues of $L$ are negative,
except a zero eigenvalue. We define the collective phase by $\Theta = \sum_{i=1}^N v_i\phi_i$. Combination of
$\sum_{i=1}^N v_i L_{ij} = 0$ ($1 \le j \le N$), which is derived from Eq. (1), and Eq. (32) yields

$$\dot{\Theta} = \sum_{i=1}^N v_i\dot{\phi}_i = \Omega + \sigma\sum_{i=1}^N v_i p_i(t). \tag{33}$$

Assumption (ii) implies that $\dot{\phi}_1 \approx \ldots \approx \dot{\phi}_N \approx \dot{\Theta}$. Therefore, Eq. (33) describes the
dynamical behavior of each oscillator and that of the entire network. The response of the
collective behavior to the input applied to node $i$ is weighted by $v_i$.

Appendix B: Relationship between the influence and the
PageRank



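To determine the relationship between the influence and the PageRank, we rewrite Eq. (2) as

$$R_i = \sum_{j=1}^N\left[\frac{q}{N} + (1-q)\frac{w_{ji}}{k_j^{\rm out}} + (1-q)\,\delta_{ij}\,\delta_{k_i^{\rm out},0}\right] R_j. \tag{34}$$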



From the original network G, define a complete and asymmetrically weighted network G' using
the matrix of link weights $w'_{ij} = q/N + (1-q)w_{ji}/k_j^{\rm out} + (1-q)\delta_{ij}\delta_{k_i^{\rm out},0}$. Because $\sum_{j=1}^N w'_{ji} = 1$
($1 \le i \le N$), $R_i$ in G is equal to $v_i$ in G', which we denote by $v_i(G')$ for clarity. Because self
loops do not affect the calculation of the influence, we can replace $w'_{ij}$ by $q/N + (1-q)w_{ji}/k_j^{\rm out}$.

In particular, for q = 0, $R_i$ is equal to $v_i(G')$, where G' is defined by $w'_{ij} = w_{ji}/k_j^{\rm out}$. In
this case, the PageRank and the influence are connected by the simple relationship given by 
Eq. ©. 
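
The correspondence between the PageRank of G and the influence on the modified network can be checked on a toy example. The sketch below is a hedged illustration, not the procedure used in the paper: it assumes the reading of the PageRank in which a dangling node retains a fraction 1 - q of its own PageRank, builds the modified weights w'_ij accordingly, and verifies that the PageRank vector has zero residual in the influence equation on the modified network. The function names `pagerank` and `modified_weights` and the three-node network are hypothetical.

```python
def pagerank(w, q, iters=500):
    """Power iteration for R_i = q/N + (1-q) sum_j (w_ji / k_j^out) R_j,
    plus (1-q) R_i when node i is dangling (k_i^out = 0)."""
    n = len(w)
    k_out = [sum(row) for row in w]
    R = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            inflow = sum(w[j][i] / k_out[j] * R[j]
                         for j in range(n) if k_out[j] > 0)
            keep = R[i] if k_out[i] == 0 else 0.0
            new.append(q / n + (1 - q) * (inflow + keep))
        R = new
    return R


def modified_weights(w, q):
    """w'_ij = q/N + (1-q) w_ji / k_j^out + (1-q) delta_ij delta_{k_i^out,0}."""
    n = len(w)
    k_out = [sum(row) for row in w]
    wp = [[q / n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if k_out[j] > 0:
                wp[i][j] += (1 - q) * w[j][i] / k_out[j]
            if i == j and k_out[i] == 0:
                wp[i][j] += 1 - q
    return wp


# Hypothetical network with a dangling node (node 2); w[i][j] is link i -> j.
w = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
q = 0.15
R = pagerank(w, q)
wp = modified_weights(w, q)
n = len(w)
in_strength = [sum(wp[j][i] for j in range(n)) for i in range(n)]
# Residual of the influence equation on G' evaluated at the PageRank of G.
residual = [sum(wp[i][j] * R[j] for j in range(n)) - in_strength[i] * R[i]
            for i in range(n)]
```

In this example the dangling node collects most of the PageRank, which is consistent with the remark in the main text that dangling nodes can have large PageRanks under this definition.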

Appendix C: Detailed analysis of the C. elegans neural 
network 

The relative contribution of a chemical synapse and that of a gap junction to signal transduction 
in the C. elegans neural circuitry are unknown. In the main text, we have assumed that the 
neural network is a weighted network in which a chemical synapse has the same link weight 
as a gap junction. Here we examine three other variants of C. elegans neural networks. In 
these three neural networks, we neglect the link weight and/or gap junctions. The omission 
of the link weight reflects the possibility that the intensity of the communication between 
two neurons may saturate as the number of synapses increases. The omission of gap junctions 
reflects the possibility that gap junctions may not contribute to signal processing as significantly 
as chemical synapses. Note that the largest strongly connected component shrinks to a network 
of 237 nodes with 1936 synapses by the omission of gap junctions. 

For the three neural networks, the values of the PCC between the centralities of the nodes
and the three approximators are listed in Tab. A1. We have examined both $v_i$ and $R_i(G^{\rm rev})$
with q = 0. In general, the MA-Mod predicts $v_i$ and $R_i(G^{\rm rev})$ better than the MA in the three
networks. The results listed in Tab. A1 are consistent with those presented in the main text.

For the four neural networks, including the one in the main text, the 10 most influential
neurons are listed in Tab. A2. This list of 10 neurons is largely consistent across different
definitions of neural network. For the majority of these neurons, $v_i$ is larger than the value
predicted from the MA.

Appendix D: Power iteration 

If we use a standard numerical method such as the Gaussian elimination, the computation time
required for calculating $v_i$ and $R_i$ from Eqs. (1) and (2), respectively, is $O(N^3)$. For sparse
networks, carrying out power iteration (also called Jacobi iteration) may be much faster. The
convergence of this iteration is guaranteed, as explained below for the influence. The proof for
the PageRank is almost the same.

We rewrite Eq. (1) as

$$v_i = \sum_{j=1}^N \frac{w_{ij}}{\sum_{j'=1}^N w_{j'i}}\, v_j. \tag{35}$$

Equation (35) indicates that $v_i$ is the $i$-th element of the right eigenvector of the matrix $M =
(M_{ij}) = (w_{ij}/\sum_{j'=1}^N w_{j'i})$ for the eigenvalue equal to unity. Multiplying $M$ by the diagonal
matrix $(\delta_{ij}/\sum_{i'=1}^N w_{i'j})$ on the right and its inverse on the left does not alter the spectrum
of $M$. This operation yields a new matrix whose element is given by $(w_{ij}/\sum_{i'=1}^N w_{i'j})$.
The spectral radius of the new matrix is at most unity because each of its columns sums to unity,
i.e., the maximum row sum matrix norm [49, p. 295] of its transpose, which has the same spectrum,
is equal to unity. Consequently, the spectral radius of $M$ is equal to unity.

Consider the power iteration scheme in which the $(t+1)$-th estimate of $v_i$ ($1 \le i \le N$)
is given by the RHS of Eq. (35) in which the $t$-th estimate of $v_i$ ($1 \le i \le N$) is substituted.
If the network is strongly connected and aperiodic, the nonnegative matrix $M$ is primitive,
i.e., the eigenvalue of the largest modulus, which is equal to unity in the present case, is
unique [49, p. 516]. Then, the convergence of power iteration to the correct $(v_1, \ldots, v_N)$ is
guaranteed [49, p. 523]. The Perron-Frobenius theorem [49] guarantees that the Perron vector
$(v_1, \ldots, v_N)$ is uniquely determined and that $v_i > 0$ ($1 \le i \le N$). The power iteration converges
quickly if the modulus of the second eigenvalue of $M$ is considerably smaller than that of the
largest eigenvalue, i.e., unity.
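
The power iteration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the implementation used for the results: the function name `influence_power_iteration`, the tolerance, the renormalization at every step, and the three-node network are hypothetical, and the network is assumed to be strongly connected and aperiodic.

```python
def influence_power_iteration(w, tol=1e-12, max_iter=10000):
    """Iterate v_i <- sum_j w[i][j] v_j / k_i^in until the update is below tol.

    w[i][j] is the weight of the link from node i to node j. Returns the
    vector normalized to sum to unity and the number of iterations used.
    """
    n = len(w)
    k_in = [sum(w[j][i] for j in range(n)) for i in range(n)]
    v = [1.0 / n] * n
    for t in range(1, max_iter + 1):
        new = [sum(w[i][j] * v[j] for j in range(n)) / k_in[i]
               for i in range(n)]
        total = sum(new)
        new = [x / total for x in new]  # keep sum_i v_i = 1
        change = max(abs(a - b) for a, b in zip(new, v))
        v = new
        if change < tol:
            return v, t
    raise RuntimeError("power iteration did not converge")


# Hypothetical strongly connected, aperiodic network (cycles of length 2 and 3).
w = [[0, 1, 0],
     [0, 0, 1],
     [1, 1, 0]]
v, n_iter = influence_power_iteration(w)
```

For this network the second-largest eigenvalue modulus of M is about 0.71, so the iteration reaches the tolerance within roughly a hundred steps; a modulus closer to unity would slow it down accordingly.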

Appendix E: PCC 

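The PCC between $v_i$ and an estimator $v_i^{\rm est}$, such as MA, Mod, and MA-Mod, is defined by

$$\mathrm{PCC} = \frac{\sum_{i=1}^N \left(v_i - 1/N\right)\left(v_i^{\rm est} - 1/N\right)}{\sqrt{\sum_{i=1}^N \left(v_i - 1/N\right)^2}\,\sqrt{\sum_{i=1}^N \left(v_i^{\rm est} - 1/N\right)^2}}.$$

Note that $\sum_{i=1}^N v_i/N = \sum_{i=1}^N v_i^{\rm est}/N = 1/N$.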
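
A direct way to evaluate the PCC is sketched below in Python. The function name `pcc` and the toy vectors are hypothetical; the function computes the ordinary Pearson correlation coefficient, with the means computed from the data (they equal 1/N whenever both vectors sum to unity).

```python
import math


def pcc(v, v_est):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(v)
    mean_v = sum(v) / n
    mean_e = sum(v_est) / n
    cov = sum((a - mean_v) * (b - mean_e) for a, b in zip(v, v_est))
    var_v = sum((a - mean_v) ** 2 for a in v)
    var_e = sum((b - mean_e) ** 2 for b in v_est)
    return cov / math.sqrt(var_v * var_e)


# Hypothetical values; both vectors sum to 1, so both means equal 1/N.
v = [0.1, 0.2, 0.3, 0.4]
v_est = [0.15, 0.15, 0.3, 0.4]
r = pcc(v, v_est)
```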

Appendix F: Results for unweighted email social network 

The values of the PCC between the two centrality measures and different estimators for the
unweighted email social network are listed in Tab. A3. The results are qualitatively the same
as those for the weighted network shown in the main text.



Table Al: PCC between centrality measures and different estimators for C. elegans neural 
networks. 

Table A2: Most influential neurons in C. elegans neural networks. $v_i$/(MA) indicates $v_i$ divided
by the value obtained from the MA.

Table A3: PCC between centrality measures and different estimators for unweighted email
social network.

Figure 1: (a) A part of C. elegans neural network composed of the 4 largest modules. The link
weight is equal to the sum of the number of chemical synapses and that of the gap junctions.
The original network has 274 nodes, 2959 links, and 13 modules, while the depicted subnetwork
has 159 nodes and 1363 links. The values indicate the summed link weights from one module to
another. (b) Approximation of intermodular connectivity by links between portal nodes.

Figure 2: Hierarchical multipartite network with N = 12 and P = 4.

Figure 3: Cumulative distribution of $v_i$ for 54 sensory neurons, 79 interneurons, and 87 motor
neurons in C. elegans neural network.

Figure 4: (a) $v_i$ for neural network, (b) $v_i$ for email social network, and (c) $R_i$ for WWW with
q = 0. The quantities placed on the horizontal axis are the MA (i.e., the normalized $k_i^{\rm out}/k_i^{\rm in}$
for $v_i$ and the normalized $k_i^{\rm in}$ for $R_i$) (red squares), Mod (green triangles), and MA-Mod (blue
circles).